Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input

Authors

Abstract

Similar resources

Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input

In this paper, we explore neural network models that learn to associate segments of spoken audio captions with the semantically relevant portions of natural images that they refer to. We demonstrate that these audio-visual associative localizations emerge from network-internal representations learned as a by-product of training to perform an image-audio retrieval task. Our models operate direct...

Learning Spoken Words from Multisensory Input

Speech recognition and speech translation are traditionally addressed by processing acoustic signals while nonlinguistic information is typically not used. In this paper, we present a new method which explores the spoken word learning from naturally co-occurring multisensory information in a dyadic (two-person) conversation. It has been noticed that the listener always has a strong tendency to l...

Interference of spoken word recognition through phonological priming from visual objects and printed words.

Three cross-modal priming experiments examined the influence of preexposure to pictures and printed words on the speed of spoken word recognition. Targets for auditory lexical decision were spoken Dutch words and nonwords, presented in isolation (Experiments 1 and 2) or after a short phrase (Experiment 3). Auditory stimuli were preceded by primes, which were pictures (Experiments 1 and 3) or th...

Learning words from natural audio-visual input

We present a model of early word learning which learns from natural audio and visual input. The model has been successfully implemented to learn words and their audio-visual grounding from camera and microphone input. Although simple in its current form, this model is a first step towards a more complete, fully-grounded model of language acquisition. Practical applications include adaptive human-...

Differences between written and spoken input in learning new words

We trained adult learners the meanings of rare words to test hypotheses about modality effects in learning word forms. These hypotheses are that (1) written (orthographic) training leads to a better representation of word form than phonological training, that (2) recognition memory for a word is partly dependent upon congruence between training and testing modality (written vs. spoken) but that...

Journal

Journal title: International Journal of Computer Vision

Year: 2019

ISSN: 0920-5691, 1573-1405

DOI: 10.1007/s11263-019-01205-0